GECCO 2004 CD-ROM (LNCS 3102)

Metaheuristics for Natural Language Tagging

Lourdes Araujo¹, Gabriel Luque², and Enrique Alba²

¹Dpto. Sistemas Informáticos y Programación, Facultad de Informática, Univ. Complutense, 28040 Madrid, SPAIN
lurdes@sip.ucm.es

²Dpto. de Lenguajes y Ciencias de la Computación, E.T.S. Ingeniería Informática, Campus Teatinos, 29071, Málaga, SPAIN
eat@lcc.uma.es
gabriel@lcc.uma.es

Abstract. This work compares several metaheuristic techniques applied to an important problem in natural language processing: tagging. Tagging assigns to each word in a text one of its possible lexical categories (tags) according to the context in which the word is used; it is therefore a disambiguation task. Specifically, we have applied a classic genetic algorithm (GA), a CHC algorithm, and simulated annealing (SA). The aim of the work is to determine which algorithm (GA, CHC or SA) is the most accurate, which encoding (integer or binary) is the most appropriate for the problem, and how parallelism affects each method. The work has been greatly simplified by the use of MALLBA, a library of search techniques that provides generic optimization software skeletons able to run in sequential, LAN and WAN environments. Experiments show that the GA with integer encoding provides the most accurate results. For the CHC algorithm, the best results are obtained with binary encoding and a parallel implementation. SA provides less accurate results than either of the evolutionary algorithms.
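
As an illustration of what an integer encoding of the tagging problem might look like, the following minimal Python sketch represents a candidate solution as one integer per word, where each integer indexes into that word's list of possible tags. The abstract does not specify the encoding details or the fitness function, so everything here is an assumption made for illustration: the toy lexicon, the tag names, and the helper functions `decode`, `random_individual` and `search_space_size` are hypothetical and are not taken from the paper or from MALLBA.

```python
import random

# Hypothetical toy lexicon (an assumption): each word maps to its candidate tags.
LEXICON = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "AUX"],
    "rusts": ["VERB", "NOUN"],
}


def decode(sentence, genes):
    """Turn an integer-encoded individual into a tag sequence.

    genes[i] is an index into the candidate-tag list of sentence[i].
    """
    return [LEXICON[word][gene] for word, gene in zip(sentence, genes)]


def random_individual(sentence, rng):
    """Draw one valid individual: a random tag index for every word."""
    return [rng.randrange(len(LEXICON[word])) for word in sentence]


def search_space_size(sentence):
    """Count the distinct taggings: the product of per-word tag counts."""
    size = 1
    for word in sentence:
        size *= len(LEXICON[word])
    return size


sentence = ["the", "can", "rusts"]
print(decode(sentence, [0, 0, 0]))   # ['DET', 'NOUN', 'VERB']
print(search_space_size(sentence))   # 6
print(decode(sentence, random_individual(sentence, random.Random(0))))
```

A search method such as a GA, CHC or SA would then explore this space of index vectors, scoring each decoded tag sequence with some context-based fitness measure; that measure is deliberately left out here because the abstract does not describe it.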

LNCS 3102, p. 889 ff.
